String manipulation can be made convenient within the *** language by
implementing two functions:

1) match [workspace; pattern]

and

2) construct [format; pmatch].

In this memo I describe how I think these two functions can be implemented,
and how they might be used to express operations now conveniently denoted
in string manipulation languages such as COMIT, SNOBOL, and METEOR.

Pattern Matching

The first argument of match, called the "workspace", is the string to
be manipulated. This string is an ordered list of elements. An individual
element may be a single character, such as "A", "1", ".", etc., or a sequence
of characters, e.g., "cat", "dog", or even another string.

The second argument of match is a pattern or prototype of a string. 
This prototype consists of a sequence of elementary patterns. We say that 
the workspace matches the prototype if there is a consecutive set of sub- 
segments of the workspace that match in turn each of the elementary patterns 
in the prototype. 

The following types of elementary patterns should be available:

1. A quoted item -- will match any occurrence in the workspace of an identical
item.

2. A string name -- will match any substring of the workspace identical to
the string named.

3. An n item marker -- will match any n consecutive elements of the workspace.

4. An n item marker with condition -- will match n consecutive items which
meet a specified condition.

5. An indefinite string marker -- will match a substring in the workspace of
any length, including zero. This elementary pattern is equivalent
to the COMIT $. It is tentatively matched with an empty substring. If
the remainder of the prototype does not match the remainder of the
workspace, the $ is tentatively matched to the workspace substring
containing 1 element, etc. In other words, this marker will match the smallest
substring of the workspace which will allow the remainder of the
prototype to match the remainder of the workspace.

6. An indefinite string marker with condition specified -- similar to the
COMIT $, this marker will match the smallest substring of the workspace
which both meets the specified condition and allows the remainder of
the prototype to match the remainder of the workspace.

7. End markers -- the left end marker will not permit the following elementary
pattern to match any substring which is not an initial substring of the
workspace; similarly, a right end marker will permit the immediately
preceding elementary pattern to match only a terminal substring of the
workspace.

Prototypes can consist of any non-empty sequence of these elementary 
patterns, with the obvious restriction that end markers should appear only 
at the ends of the prototype. If more than one match can be found for any
prototype, it is understood that the leftmost match will be taken. There 
should probably be a mode of operation in which all possible matches will 
be found by storing and returning later to any stage of the search. 
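
The "all possible matches" mode can be sketched with a Python generator, which suspends the search after each match and resumes it on request. This is only an illustration under stated assumptions, not the proposed implementation: the name all_matches and the pattern encoding (a str for a quoted item, an int n for an n item marker, None for the indefinite string marker) are hypothetical, and conditions and end markers are left out.

```python
from typing import Iterator, List, Tuple, Union

# Hypothetical encoding, not the memo's notation:
#   str  -> quoted item, int n -> n item marker, None -> indefinite string marker
Pattern = Union[str, int, None]


def all_matches(ws: List[str], pats: List[Pattern]) -> Iterator[Tuple[int, List[List[str]]]]:
    """Yield (start, segments) for every match, leftmost start first."""

    def walk(i: int, pos: int) -> Iterator[List[List[str]]]:
        if i == len(pats):
            yield []
            return
        p = pats[i]
        if p is None:
            # Indefinite marker: try the empty substring first, then longer ones.
            ends = range(pos, len(ws) + 1)
        elif isinstance(p, int):
            ends = [pos + p] if pos + p <= len(ws) else []
        else:
            ends = [pos + 1] if pos < len(ws) and ws[pos] == p else []
        for end in ends:
            for tail in walk(i + 1, end):
                yield [ws[pos:end]] + tail

    for start in range(len(ws) + 1):
        for segments in walk(0, start):
            yield start, segments
```

Taking only the first result of the generator gives the leftmost match, with each indefinite marker as short as possible; continuing to iterate returns to the saved stage of the search and produces the remaining matches.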

The value of match is determined by the results of the matching
operation. If no match is made, this value will be the null list, or some other
easily identifiable flag. If a match is achieved, and the prototype consists
of the sequence of elementary patterns P1, P2, ..., Pk, then the value will
be a symbolic array of k items. The jth element of this array is a copy of
the subsegment Sj of the workspace which matched the elementary pattern Pj.
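
One way the matching rules and the value of match could fit together is sketched below in Python. It is a minimal sketch under stated assumptions, not a definitive implementation: the tuple encodings of the elementary patterns, the helper candidates, and the choice to report an empty segment for each end marker are all hypothetical. A condition is modeled as a function that receives the candidate segment and returns True or False.

```python
from typing import Iterator, List, Sequence

# Hypothetical encodings of the elementary patterns (not the memo's notation):
#   ("item", x)      quoted item x
#   ("string", xs)   contents of a named string, given as a list of elements
#   ("n", n, cond)   n item marker; cond is None or a function of the segment
#   ("any", cond)    indefinite string marker; cond as above
#   ("end",)         end marker, allowed only first or last in the prototype


def candidates(pat: tuple, ws: Sequence[str], pos: int) -> Iterator[int]:
    """Yield every end position pat can reach from pos, shortest first."""
    kind = pat[0]
    if kind == "item":
        if pos < len(ws) and ws[pos] == pat[1]:
            yield pos + 1
    elif kind == "string":
        end = pos + len(pat[1])
        if list(ws[pos:end]) == list(pat[1]):
            yield end
    elif kind == "n":
        n, cond = pat[1], pat[2]
        end = pos + n
        if end <= len(ws) and (cond is None or cond(list(ws[pos:end]))):
            yield end
    elif kind == "any":
        cond = pat[1]
        for end in range(pos, len(ws) + 1):
            if cond is None or cond(list(ws[pos:end])):
                yield end
    else:
        raise ValueError(f"unknown elementary pattern: {pat!r}")


def match(ws: Sequence[str], pats: Sequence[tuple]) -> List[List[str]]:
    """Return copies of the k matched subsegments, or [] if there is no match."""
    core = list(pats)
    left = bool(core) and core[0] == ("end",)
    if left:
        core = core[1:]
    right = bool(core) and core[-1] == ("end",)
    if right:
        core = core[:-1]

    def walk(i: int, pos: int):
        if i == len(core):
            # A right end marker requires the match to end with the workspace.
            return [] if (not right or pos == len(ws)) else None
        for end in candidates(core[i], ws, pos):
            tail = walk(i + 1, end)
            if tail is not None:
                return [list(ws[pos:end])] + tail
        return None

    # A left end marker pins the start; otherwise take the leftmost start.
    for start in ([0] if left else range(len(ws) + 1)):
        found = walk(0, start)
        if found is not None:
            return ([[]] if left else []) + found + ([[]] if right else [])
    return []
```

Because the indefinite marker offers its shortest candidate first and the search backs up only when the rest of the prototype fails, each indefinite marker ends up with the smallest substring that lets the remainder match.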

Each matched subsegment can thus be referred to by a number which is
its position in the match array. This suggests another elementary pattern
element:

8. A previous match name -- this will match a substring in the workspace which
is identical to a substring matched by an elementary pattern earlier in
the prototype.

In addition to these numerical names, provision should be made to allow
any matched subsegment of the workspace to be assigned directly as a value
of a variable, as is done in SNOBOL. This could, of course, be done indirectly
by having a separate set statement and assigning to the variable a value
which is the appropriate subsegment from the list of matched subsegments.
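
The previous match name and the direct assignment of a matched subsegment to a variable can be sketched together. The Python below is an illustration under stated assumptions: each pattern entry is a hypothetical triple (name, kind, arg), where name is None or the variable to assign, and kind is "n" (n item marker), "any" (indefinite string marker), or "ref" (previous match name, numbered from 1). The function name match_with_names and the returned dictionary of variables are likewise assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical entry: (variable name or None, kind, argument)
Entry = Tuple[Optional[str], str, Optional[int]]


def match_with_names(ws: List[str], pats: List[Entry]):
    """Return (segments, variables); ([], {}) when nothing matches."""

    def walk(i: int, pos: int, segs: List[List[str]]):
        if i == len(pats):
            return segs
        _, kind, arg = pats[i]
        if kind == "n":
            ends = [pos + arg] if pos + arg <= len(ws) else []
        elif kind == "any":
            ends = range(pos, len(ws) + 1)
        elif kind == "ref":
            # Must repeat exactly what elementary pattern number arg matched.
            prev = segs[arg - 1]
            end = pos + len(prev)
            ends = [end] if ws[pos:end] == prev else []
        else:
            raise ValueError(f"unknown kind: {kind!r}")
        for end in ends:
            found = walk(i + 1, end, segs + [ws[pos:end]])
            if found is not None:
                return found
        return None

    for start in range(len(ws) + 1):
        segs = walk(0, start, [])
        if segs is not None:
            variables: Dict[str, List[str]] = {
                name: seg for (name, _, _), seg in zip(pats, segs) if name
            }
            return segs, variables
    return [], {}
```

For the workspace X A B Y A B, the pattern "two items named SUB, then anything, then a repeat of match 1" skips the start at X (since X A never recurs) and settles on A B, Y, A B, leaving SUB bound to A B.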

Proposed Notations for the Prototype

A prototype is a sequence of elementary patterns P1, P2, ..., Pk. A
prototype is bracketed by parentheses and the elementary patterns separated
by plus signs, i.e.,

(P1 + P2 + P3 + ... + Pk)

Only P1 or Pk (or both) may be an end marker. We will use $0 for these
end markers (consistent with the COMIT notation). For the other elementary
patterns we will use:

1. Quoted strings -- the quoted string of the *** language, e.g., "ABC" or
"#Y,<Z" etc., where ^ and 2 may be included in a string by preceding
them with X (see section 2.6.1 of MAOM-158).

2. String name -- the string name, e.g., LIST3 as a pattern element will match
a subsegment of the workspace which is the same as the string named LIST3.

3. An n item marker -- $n (again consistent with COMIT), e.g., $1, $2, etc.

4. An n item marker with a condition -- $n/PROC; the condition is specified
by naming a procedure, in this case PROC, which has appropriate values
and knows how to communicate the match or no-match condition. Arguments
for this procedure may follow the procedure name.

5. An indefinite string marker -- $ (consistent with COMIT).

6. An indefinite string marker with a condition -- $/PROC, where PROC is an
appropriate procedure name.

7. End markers -- $0.

8. Previous match name -- n, where n is a number which is a previous match
name, as in COMIT.

To give a permanent name to any matched substring, that is, to assign this
substring as a value for a variable, precede the elementary pattern by this
name followed by a "/", e.g., SUB/$3 will assign the three element substring
which matches the $3 to be the value of the variable named SUB. SUB/$3 will
not be confused with an elementary pattern consisting of the string name SUB,
since in the latter case the "/" following SUB cannot appear.
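
The notation above is regular enough to be read mechanically. The Python sketch below shows one possible reader; it is an assumption-laden illustration, not part of the proposal. The names parse_prototype, split_on_plus, and ELEMENT are hypothetical, quoted strings are assumed to contain no escaped quote characters, and procedure arguments after PROC are not handled.

```python
import re
from typing import List, Optional, Tuple


def split_on_plus(body: str) -> List[str]:
    """Split on plus signs that are not inside a quoted string."""
    parts, current, quoted = [], [], False
    for ch in body:
        if ch == '"':
            quoted = not quoted
        if ch == "+" and not quoted:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


ELEMENT = re.compile(
    r'''
    (?:(?P<name>[A-Za-z]\w*)/)?                        # optional NAME/ prefix
    (?:
        "(?P<quoted>[^"]*)"                            # quoted string
      | \$(?P<count>\d+)?(?:/(?P<proc>[A-Za-z]\w*))?   # $, $n, $0, optional /PROC
      | (?P<ref>\d+)                                   # previous match name
      | (?P<string>[A-Za-z]\w*)                        # string name
    )
    ''',
    re.VERBOSE,
)


def parse_prototype(text: str) -> List[Tuple[Optional[str], tuple]]:
    """Return a list of (variable name or None, elementary pattern)."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("a prototype must be bracketed by parentheses")
    parsed = []
    for piece in split_on_plus(text[1:-1]):
        m = ELEMENT.fullmatch(piece)
        if m is None:
            raise ValueError(f"unrecognized elementary pattern: {piece!r}")
        if m["quoted"] is not None:
            element = ("quoted", m["quoted"])
        elif m["ref"] is not None:
            element = ("previous", int(m["ref"]))
        elif m["string"] is not None:
            element = ("string", m["string"])
        elif m["count"] is None:
            element = ("indefinite", m["proc"])
        elif int(m["count"]) == 0:
            element = ("end",)
        else:
            element = ("n", int(m["count"]), m["proc"])
        parsed.append((m["name"], element))
    for i, (_, element) in enumerate(parsed):
        if element == ("end",) and i not in (0, len(parsed) - 1):
            raise ValueError("end markers may appear only at the ends")
    return parsed
```

The optional NAME/ prefix is tried before the string-name alternative, which mirrors the remark that SUB/$3 cannot be confused with the string name SUB: a bare name is never followed by "/".
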
Constructing New Strings

The function construct will construct a new string, given a format
and a symbolic array which is an output from a match. The format is a
sequence of elementary format elements which specify items to be placed
in the new string. An element can be:

1. A quoted string -- e.g., "abc" etc.; this element will be placed in the
new string in the position given.

2. A string name -- e.g., LIST3; a copy of the string named LIST3 will be
concatenated into the workspace.

3. A reference number -- e.g., 3, which refers to the subsegment of the
original workspace matched by the third elementary pattern of the
prototype.

4. Any *** function of the above items, including those done for effect,
as are LISP pseudo-functions.

A format sequence of elementary format elements F1, F2, ..., Fn is
bracketed by parentheses and separated by plus signs, i.e., (F1 + F2 + ... + Fn).
Each Fi is independent of any other in the format string (and may be identical
to another). The newly constructed string is also independent of the original
workspace.
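
A minimal Python sketch of construct follows, assuming a hypothetical tuple encoding of the four kinds of format element and a dictionary that maps string names to their contents. A quoted string is treated here as a single element of the new string, which is an assumption; the function-valued element simply applies a Python callable to the values of its argument elements.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical encodings of the format elements (not the memo's notation):
#   ("quoted", s)         a quoted string, placed as one element
#   ("string", name)      a copy of the named string
#   ("ref", j)            the subsegment matched by elementary pattern j (from 1)
#   ("call", fn, args)    fn applied to the values of the elements in args


def construct(
    fmt: Sequence[tuple],
    pmatch: Sequence[List[str]],
    strings: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Build a new string (a list of elements) from a format and a match array."""
    strings = strings or {}

    def value(element: tuple) -> List[str]:
        kind = element[0]
        if kind == "quoted":
            return [element[1]]
        if kind == "string":
            return list(strings[element[1]])      # copy, never the original
        if kind == "ref":
            return list(pmatch[element[1] - 1])   # copy of the matched segment
        if kind == "call":
            fn, args = element[1], element[2]
            return list(fn(*[value(a) for a in args]))
        raise ValueError(f"unknown format element: {element!r}")

    result: List[str] = []
    for element in fmt:
        result.extend(value(element))
    return result
```

Every value is copied before it is placed, so the new string shares nothing with the match array or the named strings and can be altered freely afterwards.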

Conclusion 

The string manipulation features described here are intended for use
within the *** language, and thus no special control features have been
suggested. The control mechanisms used by COMIT, SNOBOL and METEOR can be
built from those provided. Utilizing the two functions described here, very
complex string manipulations may be described very concisely. Since this
string manipulation feature is still a proposal, critical suggestions on
notation and features are welcome.